CAPANIC: A Parallel Tree N-body code for inhomogeneous clusters of processors
نویسندگان
چکیده
We have implemented a parallel version of the Barnes-Hut 3-D N-body tree algorithm under PVM 3.2.5, adopting an SPMD paradigm. We parallelize the problem by decomposing the physical domain by means of the Orthogonal Recursive Bisection octtree scheme suggested by Salmon (1991), but we modify the original hypercube communication pattern into an incomplete hypercube, which is more suitable for a generic inhomogenous cluster architecture. We address dynamical load balancing by assigning different ”weights” to the spawned tasks according to the dynamically changing workloads of each task. The weights are determined by monitoring the local platforms where the tasks are running and estimating the performance of each task. The monitoring scheme is flexible and allows us to address at the same time cluster and intrinsic sources of load imbalance. We then show measurements of the performance of our code on a test case of astrophysical interest in order to test the performance of our load-balancing scheme.
منابع مشابه
Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method
In this article, a parallel computer program is implemented, based on Finite Element Method, to speed up the analysis of hollow circular cylinders, made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials, which their composition gradually varies over volume. In parallel processing, an algorithm is first divided to independent tasks, which may use individual or shared da...
متن کاملA Message-Passing Distributed Memory Parallel Algorithm for a Dual-Code Thin Layer, Parabolized Navier-Stokes Solver
In this study, the results of parallelization of a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for solving supersonic turbulent flow around body and wing-body combinations are presented. As a serial code, TLNS solver is very time consuming and takes a large part of memory due to the iterative and lengthy computations. Also for complicated geometries, an exceeding number of grid...
متن کاملA High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
متن کاملA High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
متن کاملFLY: MPI-2 high resolution code for LSS cosmological simulations
Cosmological simulations of structures and galaxies formations have played a fundamental role in the study of the origin, formation and evolution of the Universe. These studies improved enormously with the use of supercomputers and parallel systems and, recently, grid based systems and Linux clusters. Now we present the new version of the tree N-body parallel code FLY that runs on a PC Linux Cl...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1994